Rejection Threshold Estimation for an Unknown Language Model in an OCR Task
Authors
Abstract
In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation is used as a confidence value to reject the strings that are less likely to be correct, and the error rate of the accepted strings should be strictly controlled by the user. In this work, the expected error rate distribution of an unknown language model is estimated from a training set composed of known language models. This means that after building a new language model, the user should be able to automatically “fix” the expected error rate at an acceptable level instead of having to deal with an arbitrary threshold.
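The link between a cost threshold and the error rate of the accepted strings can be illustrated with a minimal sketch. It is not the estimator described in the abstract, which predicts the error rate distribution for an unknown language model from a set of known ones. It only shows, for one labeled sample, how a target error rate can be turned into a threshold. The function name `pick_rejection_threshold` and its parameters are hypothetical, and the sketch assumes a string is accepted when its transformation cost is at or below the threshold.

```python
from typing import Optional, Sequence


def pick_rejection_threshold(
    costs: Sequence[float],
    correct: Sequence[bool],
    target_error_rate: float,
) -> Optional[float]:
    """Return the largest cost threshold whose accepted set meets the target.

    Assumption: a string is accepted when its cost is <= the threshold.
    Returns None when no threshold keeps the error rate within the target.
    """
    if len(costs) != len(correct):
        raise ValueError("costs and correct must have the same length")

    # Sort by cost so that each prefix is the set accepted at some threshold.
    ranked = sorted(zip(costs, correct), key=lambda pair: pair[0])

    best: Optional[float] = None
    errors = 0
    for i, (cost, is_correct) in enumerate(ranked):
        if not is_correct:
            errors += 1
        # A threshold accepts every string with the same cost, so only
        # evaluate the error rate at the last item of a run of ties.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != cost
        if is_last_of_tie and errors / (i + 1) <= target_error_rate:
            best = cost
    return best
```

With such a helper, the user states the acceptable error rate and the threshold follows from the labeled sample, rather than the threshold being chosen by hand.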
Related resources
Semi-Blind Channel Estimation based on subspace modeling for Multi-user Massive MIMO system
Channel estimation is essential to fully exploit the advantages of massive MIMO systems. In this paper, we propose a semi-blind downlink channel estimation method for massive MIMO systems. We suggest a new model for the channel matrix subspace. Based on the low-rank property, we propose an algorithm to estimate the channel matrix subspace. In the next step, using o...
Improved utterance rejection using length dependent thresholds
In this paper, we propose to use an utterance length (duration) dependent threshold for rejecting an unknown input utterance with a general speech (garbage) model. A general speech model, compared with more sophisticated anti-subword models, is a more viable solution to the utterance rejection problem for low-cost applications with stringent storage and computational constraints. However, the ...
Just Noticeable Difference Estimation Using Visual Saliency in Images
Due to physiological and physical limitations of the brain and the eye, the human visual system (HVS) is unable to perceive changes in the visual signal whose range is lower than a certain threshold, the so-called just-noticeable distortion (JND) threshold. Visual attention (VA) provides a mechanism for selection of particular aspects of a visual scene so as to reduce the computational loa...
Word Segmentation for Urdu OCR System
This paper presents a technique for word segmentation for the Urdu OCR system. Word segmentation, or word tokenization, is a preliminary task for understanding the meanings of sentences in Urdu language processing. Several techniques are available for word segmentation in other languages, but not much work has been done on word segmentation for Urdu Optical Character Recognition (OCR) systems. A me...
An Estimation of Laffer Curve in Iran: A Non-Linear Approach
The Laffer curve indicates the relationship between tax rate and tax income. The aim of this paper is to estimate the Laffer curve for the Iranian economy. To do so, we use the threshold regression method. Empirical results indicate that when the tax rate is low (the threshold value is less than 0.0848) in the two-regime model, tax rate and tax income have a significant positive relationship, but when the tax ...
Publication date: 2010